Exhaustive Symbolic Regression
Symbolic Regression (SR) algorithms learn analytic expressions which both
accurately fit data and, unlike traditional machine-learning approaches, are
highly interpretable. Conventional SR suffers from two fundamental issues which
we address in this work. First, since the number of possible equations grows
exponentially with complexity, typical SR methods search the space
stochastically and hence do not necessarily find the best function. In many
cases, the target problems of SR are sufficiently simple that a brute-force
approach is not only feasible, but desirable. Second, the criteria used to
select the equation which optimally balances accuracy with simplicity have been
variable and poorly motivated. To address these issues we introduce a new
method for SR -- Exhaustive Symbolic Regression (ESR) -- which systematically
and efficiently considers all possible equations and is therefore guaranteed to
find not only the true optimum but also a complete function ranking. Utilising
the minimum description length principle, we introduce a principled method for
combining these preferences into a single objective statistic. To illustrate
the power of ESR we apply it to a catalogue of cosmic chronometers and the
Pantheon+ sample of supernovae to learn the Hubble rate as a function of
redshift, finding 40 functions (out of 5.2 million considered) that fit
the data more economically than the Friedmann equation. These low-redshift data
therefore do not necessarily prefer a $\Lambda$CDM expansion history, and
traditional SR algorithms that return only the Pareto front, even if they found
this successfully, would not locate $\Lambda$CDM. We make our code and full
equation sets publicly available.
Comment: 14 pages, 6 figures, 2 tables. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence
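The single objective statistic described above can be illustrated with a toy description length: a Gaussian negative log-likelihood plus a BIC-like parameter penalty. This is a deliberate simplification of ESR's full MDL codelength, and the data and candidate models below are synthetic:

```python
import numpy as np

def description_length(y, y_pred, k):
    """Toy description length: Gaussian negative log-likelihood of the
    residuals plus a BIC-like complexity penalty of (k/2) * log(n).
    A simplified stand-in for the full MDL codelength used in ESR."""
    n = len(y)
    resid = y - y_pred
    sigma2 = np.mean(resid**2)
    nll = 0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return nll + 0.5 * k * np.log(n)

# Rank two candidate fits to noisy linear data: the true 2-parameter
# line should beat an overfit degree-5 polynomial once the penalty
# for its extra parameters is included.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(100)
line = np.polyval(np.polyfit(x, y, 1), x)   # k = 2 parameters
poly = np.polyval(np.polyfit(x, y, 5), x)   # k = 6 parameters
dl_line = description_length(y, line, 2)
dl_poly = description_length(y, poly, 6)
print(dl_line < dl_poly)
```

Ranking every candidate by a single statistic of this kind, rather than returning only a Pareto front, is what allows one best function to be identified.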
Priors for symbolic regression
When choosing between competing symbolic models for a data set, a human will
naturally prefer the "simpler" expression or the one which more closely
resembles equations previously seen in a similar context. This suggests a
non-uniform prior on functions, which is, however, rarely considered within a
symbolic regression (SR) framework. In this paper we develop methods to
incorporate detailed prior information on both functions and their parameters
into SR. Our prior on the structure of a function is based on an $n$-gram
language model, which is sensitive to the arrangement of operators relative to
one another in addition to the frequency of occurrence of each operator. We
also develop a formalism based on the Fractional Bayes Factor to treat
numerical parameter priors in such a way that models may be fairly compared
through the Bayesian evidence, and explicitly compare Bayesian, Minimum
Description Length and heuristic methods for model selection. We demonstrate
the performance of our priors relative to literature standards on benchmarks
and a real-world dataset from the field of cosmology.
Comment: 8+2 pages, 2 figures. Submitted to The Genetic and Evolutionary Computation Conference (GECCO) 2023 Workshop on Symbolic Regression
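A minimal sketch of the kind of structural prior described above, assuming a bigram (n = 2) model with add-alpha smoothing rather than the paper's exact construction; the operator corpus below is invented purely for illustration:

```python
from collections import Counter
from math import log

# Hypothetical corpus of operator sequences (prefix notation) drawn
# from "previously seen" equations, used to fit a bigram model.
corpus = [
    ["add", "mul", "pow", "x", "const", "const"],
    ["mul", "const", "pow", "x", "const"],
    ["add", "const", "mul", "const", "x"],
]

def bigram_log_prob(seq, corpus, alpha=1.0):
    """Log-probability of an operator sequence under an add-alpha
    smoothed bigram model: a toy stand-in for an n-gram prior that is
    sensitive to the arrangement of operators, not just their counts."""
    vocab = {tok for s in corpus for tok in s}
    bigrams, contexts = Counter(), Counter()
    for s in corpus:
        padded = ["<s>"] + s
        contexts.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    lp, prev = 0.0, "<s>"
    for tok in seq:
        num = bigrams[(prev, tok)] + alpha
        den = contexts[prev] + alpha * (len(vocab) + 1)
        lp += log(num / den)
        prev = tok
    return lp

# A sequence resembling the corpus is more probable than a jumbled one.
familiar = ["add", "const", "mul", "const", "x"]
unusual = ["pow", "pow", "pow", "x", "x"]
print(bigram_log_prob(familiar, corpus) > bigram_log_prob(unusual, corpus))
```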
The Simplest Inflationary Potentials
Inflation is a highly favoured theory for the early Universe. It is
compatible with current observations of the cosmic microwave background and
large scale structure and is a driver in the quest to detect primordial
gravitational waves. It is also, given the current quality of the data, highly
under-determined with a large number of candidate implementations. We use a new
method in symbolic regression to generate all possible simple scalar field
potentials for one of two possible basis sets of operators. Treating these as
single-field, slow-roll inflationary models we then score them with an
information-theoretic metric ("minimum description length") that quantifies
their efficiency in compressing the information in the Planck data. We explore
two possible priors on the parameter space of potentials, one related to the
functions' structural complexity and one that uses a Katz back-off language
model to prefer functions that may be theoretically motivated. This enables us
to identify the inflaton potentials that optimally balance simplicity with
accuracy at explaining the Planck data, which may subsequently find theoretical
motivation. Our exploratory study opens the door to extraction of fundamental
physics directly from data, and may be augmented with more refined theoretical
priors in the quest for a complete understanding of the early Universe.
Comment: 13+4 pages, 4 figures; submitted to Physical Review
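The slow-roll scoring step can be sketched with the standard single-field formulas for the spectral index and tensor-to-scalar ratio (these are textbook relations, not the paper's full pipeline; the quadratic potential and e-fold count below are illustrative, not one of the paper's top-ranked candidates):

```python
from math import sqrt

def slow_roll(V, dV, ddV, phi):
    """Standard single-field slow-roll observables (M_pl = 1):
    epsilon = (V'/V)^2 / 2, eta = V''/V, n_s = 1 - 6*eps + 2*eta,
    r = 16*eps."""
    eps = 0.5 * (dV(phi) / V(phi)) ** 2
    eta = ddV(phi) / V(phi)
    ns = 1 - 6 * eps + 2 * eta   # scalar spectral index
    r = 16 * eps                 # tensor-to-scalar ratio
    return ns, r

# Illustrative candidate: V = m^2 phi^2 / 2, evaluated ~60 e-folds
# before the end of inflation, where phi^2 = 4N + 2 = 242.
V = lambda p: 0.5 * p**2
dV = lambda p: p
ddV = lambda p: 1.0
ns, r = slow_roll(V, dV, ddV, sqrt(242))
print(round(ns, 3), round(r, 3))  # ns ≈ 0.967, r ≈ 0.132
```

Each candidate potential generated by the symbolic search can be pushed through such a step before being scored against the Planck constraints.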
Modeling and testing screening mechanisms in the laboratory and in space
The non-linear dynamics of scalar fields coupled to matter and gravity can lead to remarkable density-dependent screening effects. In this short review we present the main classes of screening mechanisms and discuss their tests in laboratory experiments and astrophysical systems, such as stars, galaxies and dark matter halos. We particularly focus on the numerical and technical aspects involved in modelling the non-linear dynamics of screening.
No evidence for p- or d-wave dark matter annihilation from local large-scale structure
If dark matter annihilates into standard model particles with a cross-section which is velocity dependent, then Local Group dwarf galaxies will not be the best place to search for the resulting gamma-ray emission. A greater flux would be produced by more distant and massive halos, with larger velocity dispersions. We construct full-sky predictions for the gamma-ray emission from galaxy- and cluster-mass halos using a suite of constrained $N$-body simulations (CSiBORG) based on the Bayesian Origin Reconstruction from Galaxies algorithm. Comparing to observations from the Fermi Large Area Telescope and marginalising over reconstruction uncertainties and other astrophysical contributions to the flux, we obtain constraints on the cross-section which are two (seven) orders of magnitude tighter than those obtained from dwarf spheroidals for $p$-wave ($d$-wave) annihilation, for which the product of the cross-section and relative particle velocity scales as $v^2$ ($v^4$). We find no evidence for either type of annihilation across the particle masses and channels considered, including annihilation to bottom quarks. Our bounds, although failing to exclude the thermal relic cross-section for velocity-dependent annihilation channels, are among the tightest to date.
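The core scaling argument is simple to verify: for a cross-section with $\sigma v \propto v^n$, the annihilation emissivity per squared density of a cluster relative to a dwarf grows as the ratio of their velocity dispersions to the power $n$. The dispersions below are order-of-magnitude placeholders, not values from the paper:

```python
# Velocity-dependent annihilation: sigma*v scales as v^n with
# n = 0, 2, 4 for s-, p-, d-wave respectively.
v_dwarf = 10.0      # km/s, typical dwarf spheroidal (illustrative)
v_cluster = 1000.0  # km/s, typical galaxy cluster (illustrative)

for n, label in [(0, "s-wave"), (2, "p-wave"), (4, "d-wave")]:
    boost = (v_cluster / v_dwarf) ** n
    print(f"{label}: cluster/dwarf emissivity per density^2 boosted by {boost:.0e}")
```

The factor of $10^4$ ($10^8$) for $p$-wave ($d$-wave) is why massive halos, rather than dwarfs, dominate the expected signal for velocity-dependent annihilation.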
On the functional form of the radial acceleration relation
We apply a new method for learning equations from data -- Exhaustive Symbolic Regression (ESR) -- to late-type galaxy dynamics as encapsulated in the radial acceleration relation (RAR). Relating the centripetal acceleration due to baryons, $g_\mathrm{bar}$, to the total dynamical acceleration, $g_\mathrm{obs}$, the RAR has been claimed to manifest a new law of nature due to its regularity and tightness, in agreement with Modified Newtonian Dynamics (MOND). Fits to this relation have been restricted by prior expectations to particular functional forms, while ESR affords an exhaustive and nearly prior-free search through functional parameter space to identify the equations optimally trading accuracy with simplicity. Working with the SPARC data, we find the best functions typically satisfy $g_\mathrm{obs} \propto g_\mathrm{bar}$ at high $g_\mathrm{bar}$, although the coefficient of proportionality is not clearly unity and the deep-MOND limit as $g_\mathrm{bar} \to 0$ is little evident at all. By generating mock data according to MOND with or without the external field effect, we find that symbolic regression would not be expected to identify the generating function or reconstruct successfully the asymptotic slopes. We conclude that the limited dynamical range and significant uncertainties of the SPARC RAR preclude a definitive statement of its functional form, and hence that this data alone can neither demonstrate nor rule out law-like gravitational behaviour.
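For context, the conventional single functional form fitted to the RAR is the one proposed by McGaugh et al. (2016); the sketch below evaluates it and checks its two limits. ESR's point is precisely that this is only one of millions of candidate forms:

```python
import numpy as np

def rar_fit(g_bar, g_dagger=1.2e-10):
    """The RAR fitting function of McGaugh et al. (2016):
    g_obs = g_bar / (1 - exp(-sqrt(g_bar / g_dagger))),
    with accelerations in m/s^2. Shown as the conventional baseline,
    not as one of the paper's top-ranked ESR functions."""
    return g_bar / (1.0 - np.exp(-np.sqrt(g_bar / g_dagger)))

g_bar = np.logspace(-12, -8, 5)   # span of the SPARC dynamic range
g_obs = rar_fit(g_bar)
# High-acceleration limit: g_obs -> g_bar (ratio -> 1); deep-MOND
# limit: g_obs -> sqrt(g_bar * g_dagger), so the ratio grows as
# g_bar -> 0.
print(g_obs / g_bar)
```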
The scatter in the galaxy-halo connection: a machine learning analysis
We apply machine learning, a powerful method for uncovering complex
correlations in high-dimensional data, to the galaxy-halo connection of
cosmological hydrodynamical simulations. The mapping between galaxy and halo
variables is stochastic in the absence of perfect information, but conventional
machine learning models are deterministic and hence cannot capture its
intrinsic scatter. To overcome this limitation, we design an ensemble of neural
networks with a Gaussian loss function that predict probability distributions,
allowing us to model statistical uncertainties in the galaxy-halo connection as
well as its best-fit trends. We extract a number of galaxy and halo variables
from the Horizon-AGN and IllustrisTNG100-1 simulations and quantify the extent
to which knowledge of some subset of one enables prediction of the other. This
allows us to identify the key features of the galaxy-halo connection and
investigate the origin of its scatter in various projections. We find that
while halo properties beyond mass account for up to 50 per cent of the scatter
in the halo-to-stellar mass relation, the prediction of stellar half-mass
radius or total gas mass is not substantially improved by adding further halo
properties. We also use these results to investigate semi-analytic models for
galaxy size in the two simulations, finding that assumptions relating galaxy
size to halo size or spin are not successful.
Comment: 20 pages, 11 figures. Accepted in MNRAS
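The heteroscedastic Gaussian loss described above can be written in a few lines; the toy check below (synthetic data, not simulation outputs) shows why it recovers intrinsic scatter instead of collapsing to a point estimate:

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Heteroscedastic Gaussian negative log-likelihood: the loss that
    lets a network predict a distribution (mean and log-variance)
    rather than a point estimate, so intrinsic scatter in the mapping
    can be modelled alongside its best-fit trend."""
    return 0.5 * np.mean(log_var + (y - mu) ** 2 / np.exp(log_var))

# Toy check: for data with true scatter sigma = 0.5, the loss is
# minimised (in expectation) by predicting the true variance 0.25,
# not by shrinking the predicted variance towards zero.
rng = np.random.default_rng(1)
y = rng.normal(0.0, 0.5, size=100_000)
mu = np.zeros_like(y)
loss_true = gaussian_nll(y, mu, np.full_like(y, np.log(0.25)))
loss_small = gaussian_nll(y, mu, np.full_like(y, np.log(0.01)))
print(loss_true < loss_small)
```

Averaging the predictions of an ensemble of networks trained with such a loss then separates the model uncertainty from the intrinsic scatter of the galaxy-halo connection itself.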
Constraints on dark matter annihilation and decay from the large-scale structure of the nearby universe
Decaying or annihilating dark matter particles could be detected through
gamma-ray emission from the species they decay or annihilate into. This is
usually done by modelling the flux from specific dark matter-rich objects such
as the Milky Way halo, Local Group dwarfs and nearby groups. However, these
objects are expected to have significant emission from baryonic processes as
well, and the analyses discard gamma-ray data over most of the sky. Here we
construct full-sky templates for gamma-ray flux from the large-scale structure
within 200 Mpc by means of a suite of constrained $N$-body simulations
(CSiBORG) produced using the Bayesian Origin Reconstruction from Galaxies
algorithm. Marginalising over uncertainties in this reconstruction, small-scale
structure and parameters describing astrophysical contributions to the observed
gamma ray sky, we compare to observations from the Fermi Large Area Telescope
to constrain dark matter annihilation cross-sections and decay rates through a
Markov Chain Monte Carlo analysis. We rule out the thermal relic cross-section
for $s$-wave annihilation at 95% confidence if the annihilation produces
bosons, gluons or quarks less massive than the bottom quark. We infer a
contribution to the gamma-ray sky with the same spatial distribution as dark
matter decay. Although this could be due to dark matter decay via these
channels, we find that a power-law spectrum, likely of baryonic origin, is
preferred by the data.
Comment: 23 pages, 9 figures, 1 table. Submitted to Physical Review
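A stripped-down version of the template-fitting step, assuming a single template amplitude, a Poisson pixel likelihood and a Metropolis sampler; the template and photon counts are synthetic stand-ins, not Fermi data:

```python
import numpy as np

# Toy sky: one spatial template with a free amplitude A, observed as
# Poisson-distributed photon counts per pixel with true A = 2.
rng = np.random.default_rng(2)
template = rng.uniform(0.5, 1.5, size=200)  # expected counts at A = 1
counts = rng.poisson(2.0 * template)        # synthetic observed counts

def log_like(A):
    """Poisson log-likelihood (dropping the A-independent constant)."""
    if A <= 0:
        return -np.inf
    mu = A * template
    return np.sum(counts * np.log(mu) - mu)

# Metropolis random-walk sampler over the amplitude.
A, lp = 1.0, log_like(1.0)
samples = []
for _ in range(5000):
    A_new = A + 0.1 * rng.standard_normal()
    lp_new = log_like(A_new)
    if np.log(rng.uniform()) < lp_new - lp:  # accept/reject step
        A, lp = A_new, lp_new
    samples.append(A)
print(np.mean(samples[1000:]))  # posterior mean, close to the true A = 2
```

The real analysis fits many templates at once (large-scale structure, astrophysical backgrounds, a power-law component) and marginalises over all of them, but the sampling machinery is of this form.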